Arabic POS Tagging: Don't Abandon Feature Engineering Just Yet
Authors
Abstract
This paper compares Support Vector Machine based ranking (SVMRank) with bidirectional Long Short-Term Memory (bi-LSTM) neural-network-based sequence labeling for building a state-of-the-art Arabic part-of-speech (POS) tagging system. Using SVMRank leads to state-of-the-art results, but requires a fair amount of feature engineering. Using bi-LSTM, particularly when combined with word embeddings, may lead to competitive POS-tagging results by automatically deducing latent linguistic features. However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVMRank-based tagger yields further improvements. We also show that gains realized using embeddings may not be additive with the gains achieved due to features. We are open-sourcing both the SVMRank and the bi-LSTM based systems for the research community.
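As a rough illustration of what "augmenting" a neural sequence labeler with engineered features can mean, the sketch below concatenates a word embedding with a few hand-crafted features for each token. It is only an assumption-laden toy: the feature set, the function names `handcrafted_features` and `token_inputs`, and the two-dimensional embeddings are all hypothetical and are not taken from the paper's systems.

```python
# Hypothetical sketch: per-token input vectors formed by concatenating a word
# embedding with hand-crafted features. Not the paper's actual feature set.

def handcrafted_features(token):
    """Toy binary surface features, chosen purely for illustration."""
    return [
        1.0 if token.startswith("ال") else 0.0,  # begins with the definite article
        1.0 if token.endswith("ة") else 0.0,     # ends with ta marbouta
        1.0 if token.isdigit() else 0.0,         # consists only of digits
    ]

def token_inputs(tokens, embeddings, dim):
    """Return one vector per token: embedding (zeros if unknown) + features."""
    unknown = [0.0] * dim
    return [embeddings.get(t, unknown) + handcrafted_features(t) for t in tokens]

# Made-up 2-dimensional embeddings for two words; "2017" is out of vocabulary.
embeddings = {"الكتاب": [0.1, 0.2], "مدرسة": [0.3, 0.4]}
vectors = token_inputs(["الكتاب", "مدرسة", "2017"], embeddings, dim=2)
```

Each resulting vector would then be fed to the sequence labeler in place of the embedding alone. Concatenation is only one plausible way to combine the two sources of information.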
Similar resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Arabic Part of speech Tagging using k-Nearest Neighbour and Naive Bayes Classifiers Combination
Part Of Speech (POS) tagging forms an important preprocessing step in many natural language processing applications such as text summarization, question answering and information retrieval systems. It is the process of classifying every word in a given context to its appropriate part of speech. Different POS tagging techniques in the literature have been developed and experimented. Curre...
Using Stem-Templates to Improve Arabic POS and Gender/Number Tagging
This paper presents an end-to-end automatic processing system for Arabic. The system performs: correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context sensitive word segmentation into underlying clitics, POS tagging, and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature...
Exploiting Wiktionary for Lightweight Part-of-Speech Tagging for Machine Learning Tasks
Part-of-speech (PoS) tagging is a crucial part in many natural language machine learning tasks. Current state-of-the-art PoS taggers exhibit excellent qualitative performance, but also contribute heavily to the total runtime of text preprocessing and feature generation, which makes feature engineering a time-consuming task. We propose a lightweight dictionary and heuristics based PoS tagger that ...
Deep Learning for Chinese Word Segmentation and POS Tagging
This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance ...
Journal title:
Volume, issue
Pages: -
Publication date: 2017